Finite size scaling in neural networks

We demonstrate that the fraction of pattern sets that can be stored in
single- and hidden-layer perceptrons exhibits finite size scaling. This
feature makes it possible to estimate the critical storage capacity
$\alpha_c$ from simulations of relatively small systems. We illustrate this
approach by determining $\alpha_c$, together with the finite size scaling
exponent $\nu$, for storing Gaussian patterns in committee and parity
machines with binary couplings and up to K = 5 hidden units.

Finite size scaling (FSS) has proven to be a powerful method for analyzing
phase transitions, which occur rigorously only in the thermodynamic limit,
using simulations of systems of finite size [1]. In particular, it has
become the prime method for determining numerical values of critical
coupling parameters and exponents [?].

Phase transitions are known to occur not only in condensed matter [?] and
percolation systems [?], but also in random graphs [?], neural networks and
in algorithmic problems like search [?] and the satisfiability of random
boolean expressions [?]. Heuristic derivations of FSS rely on the divergence
of a correlation length at a critical point in the infinite system [?].
However, Kirkpatrick and Selman [?] have demonstrated recently that FSS can
also be used efficiently in problems without any intrinsic length scale,
like the connectivity of random graphs and the satisfiability of random
boolean expressions. Abstract neural networks [?] are another class of
systems without intrinsic length scale, and we will show in this
contribution that FSS occurs at the transition from storable to unstorable
pattern set sizes, and that it provides a powerful computational method for
determining critical storage capacities.

We will concentrate on particular feed-forward networks of the perceptron
class, namely multi-layer perceptrons with N input neurons, K hidden units,
and a regular tree-like connectivity (N mod K = 0), see Fig. 1, which are
also known as committee and parity machines (CM, PM) with non-overlapping
receptive fields [10-12].
Input patterns $\xi_{ik}$, k = 1, ..., K, i = 1, ..., N/K, are processed by
the following rules: The output of hidden layer cell k is given by

$$\sigma_k = \mathrm{sign}\Big(\sum_{i=1}^{N/K} J_{ik}\,\xi_{ik}\Big), \tag{1}$$

$J_{ik}$ being the coupling between input cell ik and hidden unit k, while
the final output is determined by

$$O = f(\sigma_1, \dots, \sigma_K), \tag{2}$$

where in the case of a CM the majority rule is implemented by
$f = \mathrm{sign}(\sum_k \sigma_k)$, while in the case of a PM
$f = \prod_k \sigma_k$. A standard single-layer perceptron corresponds to
K = 1. Since the majority rule is ambiguous for even K, where the hidden
units can tie, we will restrict ourselves here to CM with K odd.

A perceptron is able to store a particular set of input patterns
$\xi^\mu$, $\mu = 1, \dots, p$, if there exists a coupling set
$\{J_{ik}\}$ such that, under the action of Eqs. (1,2), a prescribed set of
outputs $\{O^\mu\}$ is generated. It is well known that for small values of
$\alpha = p/N$ such a set of couplings can always be found, while for large
enough $\alpha$ the probability for its existence vanishes. For finite
systems the fraction of all possible input-output relations of relative
size $\alpha$ that can be stored, which we will call $P(\alpha, N)$ [?],
undergoes a smooth transition from one to zero. However, in the infinite
system it switches from one to zero at the critical storage capacity
$\alpha_c$.

This behavior, together with FSS, is nicely illustrated for the
single-layer perceptron with continuous couplings and the pattern
components $\xi^\mu_{ik}$ drawn from a Gaussian distribution, where the
exact solution for $P(\alpha, N)$ is known analytically,

$$P(\alpha, N) = 2^{1-p} \sum_{i=0}^{N-1} \binom{p-1}{i}, \qquad p = \alpha N. \tag{3}$$



Figure 2 (top) shows $P(\alpha, N)$ for various values of N. The common
intersection of these curves at $\alpha = 2$ is noticed immediately. Also,
the steepness of the transition increases with system size N.
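The curves described here can be reproduced from Cover's classical counting result for points in general position, $P = 2^{1-p}\sum_{i=0}^{N-1}\binom{p-1}{i}$ with $p = \alpha N$, which is assumed here to be the exact solution referred to in the text. The following Python sketch evaluates it with exact rational arithmetic; the function name `cover_fraction` is illustrative only.

```python
from fractions import Fraction
from math import comb


def cover_fraction(p: int, n: int) -> Fraction:
    """Fraction of the 2^p output assignments on p points in general position
    in n dimensions that a continuous-coupling perceptron can realize."""
    if p < 1 or n < 1:
        raise ValueError("p and n must be positive")
    # Sum C(p-1, i) for i = 0 .. n-1; math.comb returns 0 whenever i > p-1.
    total = sum(comb(p - 1, i) for i in range(n))
    return Fraction(total, 2 ** (p - 1))


if __name__ == "__main__":
    for n in (10, 20, 40):
        # Loads alpha = 1.0, 1.5, 2.0, 2.5, 3.0 (n is even, so p is an integer).
        curve = [float(cover_fraction(k * n // 2, n)) for k in (2, 3, 4, 5, 6)]
        print(n, [round(value, 3) for value in curve])
```

At p = 2N the function returns exactly 1/2 for every N, which is the common intersection point, while the values at a fixed load above 2 shrink as N grows.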

Under FSS, systems of different size behave in an identical way near the
transition under a size-dependent rescaling of the control parameter [?],

$$y = (\alpha - \alpha_c)\, N^{1/\nu}. \tag{4}$$

Necessarily, the common intersection of the transition curves observed
above corresponds to the critical storage capacity $\alpha_c$: at
$\alpha = \alpha_c$ the rescaled variable is y = 0 for every N, so all
curves take the same value there. Figure 2 (bottom) shows that a rescaling
with $\nu = 2$ and $\alpha_c = 2$ indeed lets all transition curves fall
onto a single scaling curve. In this particular case,
the numerical value of the FSS exponent $\nu$, together with the analytic
form of the scaling function,

$$f(y) = \tfrac{1}{2}\,\mathrm{erfc}(y/2), \tag{5}$$

can be derived from the asymptotic behavior of Eq. (3) for large N.
Figure 2 demonstrates moreover that the critical storage capacity
$\alpha_c$ and the FSS exponent $\nu$ can already be estimated from systems
of relatively small size.
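The collapse for continuous couplings can be checked numerically. The sketch below assumes Cover's counting formula for $P(\alpha, N)$ and compares it, at loads $\alpha = 2 + y N^{-1/2}$, with $\tfrac{1}{2}\mathrm{erfc}(y/2)$, the limit obtained when the binomial sum is replaced by a Gaussian integral. All function names are illustrative.

```python
from math import erfc


def cover_fraction(p: int, n: int) -> float:
    """Cover's counting result 2^(1-p) * sum_{i<n} C(p-1, i), summed exactly."""
    term, total = 1, 0                          # term holds C(p-1, i), starting at i = 0
    for i in range(n):
        total += term
        term = term * (p - 1 - i) // (i + 1)    # C(p-1, i+1) from C(p-1, i)
    return total / 2 ** (p - 1)


def scaled_point(y: float, n: int, alpha_c: float = 2.0, nu: float = 2.0) -> float:
    """P(alpha, N) at alpha = alpha_c + y * N^(-1/nu), with p rounded to an integer."""
    p = round(n * (alpha_c + y * n ** (-1.0 / nu)))
    return cover_fraction(p, n)


def gaussian_limit(y: float) -> float:
    """Large-N limit of the rescaled curve under the Gaussian approximation."""
    return 0.5 * erfc(y / 2.0)


if __name__ == "__main__":
    for y in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(y, [round(scaled_point(y, n), 3) for n in (20, 200, 2000)],
              round(gaussian_limit(y), 3))
```

For N = 2000 and |y| up to 2 the exact values and the Gaussian limit agree to better than 0.02; the remaining difference comes from rounding p to an integer and from finite-size corrections.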

Simulations of neural networks are plagued by the problem that learning
algorithms [?], necessary to determine coupling sets that solve the storage
problem, are not guaranteed to reach a solution in practice, i.e. under
realistic time constraints, even if one exists. Close to $\alpha_c$ the
average learning time diverges, a behavior reminiscent of critical slowing
down [?]. The situation is worse for systems with binary couplings, since
there the usual learning algorithms are not applicable [19].

We will concentrate in the following on perceptrons with binary couplings
$J_{ik} = \pm 1$, also known as Ising perceptrons. Employing complete
enumeration of the couplings for systems up to size N = 30, simulation
results independent of any learning algorithm are obtained. We used
Gaussian patterns for the results presented in this contribution. Note that
for binary coupling perceptrons with a finite number of hidden units,
information theory gives an upper limit of one for the critical storage
capacity, i.e. $\alpha_c \le 1$, since N binary couplings carry at most N
bits.
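Complete enumeration is straightforward to sketch for small N. The Python code below is an illustration under stated assumptions rather than the program used for the reported results: hidden units take the sign of their local field, the CM output is a majority vote, the PM output is the product of the hidden units, and all names (`machine_output`, `is_storable`) are hypothetical.

```python
from itertools import product


def machine_output(couplings, pattern, k, kind):
    """Output of a tree-structured machine with k hidden units, kind 'CM' or 'PM'."""
    width = len(pattern) // k                       # inputs per hidden unit, N/K
    out_sum, out_prod = 0, 1
    for unit in range(k):
        lo = unit * width
        field = sum(j * x for j, x in zip(couplings[lo:lo + width],
                                          pattern[lo:lo + width]))
        sigma = 1 if field > 0 else -1              # zero fields do not occur for Gaussian inputs
        out_sum += sigma
        out_prod *= sigma
    if kind == "PM":
        return out_prod                             # parity of the hidden units
    return 1 if out_sum > 0 else -1                 # majority vote, assumes k odd


def is_storable(patterns, outputs, k=1, kind="CM"):
    """True if at least one binary coupling vector reproduces every prescribed output."""
    n = len(patterns[0])
    for couplings in product((-1, 1), repeat=n):    # all 2^N coupling vectors
        if all(machine_output(couplings, x, k, kind) == o
               for x, o in zip(patterns, outputs)):
            return True
    return False
```

The loop visits all 2^N coupling vectors, so the cost grows exponentially with N; at N = 30 that is about 10^9 vectors per pattern set, which calls for a far more efficient implementation than this sketch.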

Figure 3 (top) shows simulation results for $P(\alpha, N)$ for the case of
a single-layer binary coupling perceptron. Sets of input-output relations
were classified as storable or unstorable by complete enumeration of the
coupling space [?]. Each data point was sampled with about $10^3$ randomly
chosen sets of input-output relations, giving a relative error of about 3%.
As in the case of continuous couplings, Fig. 2, the curves for various
system sizes intersect at the critical storage capacity, here with the
numerical value $\alpha_c \approx 0.8$. Figure 3 (bottom) shows the same
data under rescaling with Eq. (4) and $\nu \approx 1.7$. Again, all data
points fall onto one scaling curve. Note that the value of the scaling
function at the transition, $f(0) \approx 0.7$, is different from the
continuous case ($f(0) = 0.5$).
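The quoted error can be made concrete with a short calculation. Assuming, as an interpretation that the text does not state explicitly, that the error is the binomial standard error of a fraction estimated from independent storable/unstorable trials, the Python sketch below gives about 3% for a fraction of 0.5 and 1000 samples. The function name is illustrative.

```python
from math import sqrt


def relative_sampling_error(fraction: float, samples: int) -> float:
    """Relative standard error of a fraction estimated from independent yes/no trials."""
    if not 0.0 < fraction <= 1.0 or samples < 1:
        raise ValueError("need 0 < fraction <= 1 and samples >= 1")
    # Binomial standard deviation of the estimated fraction, divided by the fraction.
    return sqrt(fraction * (1.0 - fraction) / samples) / fraction


if __name__ == "__main__":
    for fraction in (0.3, 0.5, 0.7):
        print(fraction, round(relative_sampling_error(fraction, 1000), 3))
```

Under this assumption the relative error grows as the fraction approaches zero, so the data points on the unstorable side of the transition are the noisiest.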

Results for the hidden-layer systems of parity and committee type show a
behavior qualitatively similar to the one presented in Fig. 3 for the
single-layer perceptron. We have collected our results for various values
of K in Table I. As is to be expected, $\alpha_c$ increases with the
introduction of a hidden layer of neurons. The FSS exponent $\nu$ decreases
with increasing K, to about 1.3 and 1.2 for CM, and to values around one
for PM.

The most surprising results are those for PM. Already a system with K = 2
hidden units exhibits a storage capacity extremely close to the theoretical
limit, and Table I shows that there is practically no improvement in
increasing K. In applications, patterns have to be stored using finite size
perceptrons. Since the FSS scaling function f(y) describes the asymptotic
behavior of the fraction of storable pattern sets, $P(\alpha, N)$, around
$\alpha_c$, the critical capacity has to be considered together with f(y)
when assessing the quality of a particular system. Note that f(y) decreases
considerably with K in the critical region for PM as well as CM, see f(0)
in Table I. These features suggest that a PM with K = 2 is already the best
practical binary perceptron for storing continuous patterns.

Simulation studies of the single-layer binary perceptron have been
performed before for the problems of storing binary [?] and Gaussian
patterns [?], using various approaches and not always leading to conclusive
results. Our result for $\alpha_c$ differs significantly from the
analytical result of Ref. [?] ($\alpha_c = 0.833$), obtained using a first
order replica symmetry breaking (RSB) ansatz, but could be considered
compatible, within error bars, with the simulation result of Ref. [?]
("$\alpha_c \approx 0.82$" [?]). This discrepancy between the analytical
approximation and our simulation result suggests, provided finite size
scaling holds, that the first order RSB is still insufficient for a correct
analytical treatment of the K = 1 case, despite the claims in [?]. For
binary CM and PM storing Gaussian patterns no analytical or simulation
results are available at present, to the best of our knowledge.

It has been hypothesized on the basis of replica studies [?] that the
storage capacity for binary and Gaussian patterns is identical. Previous
simulation results for K = 1 seemed to be compatible with this hypothesis
and with the RSB result reported above ($\alpha_c = 0.83$ [?],
$\alpha_c = 0.833$ [?]; see, however, [?]). Since our results differ
significantly from the RSB result, this casts some doubt on either this
hypothesis, the RSB result, or the interpretation of the simulation results
[?]. For the case of storing binary patterns in CM, simulation results
using complete enumeration have been obtained for K = 3 in [12], together
with analytical results for K = 3 ("$\alpha_c \approx 0.92$") and for
$K \to \infty$ ("$\alpha_c \approx 0.95$"), using a replica symmetric (RS)
ansatz. Although our simulation results for CM differ somewhat, they can
still be considered statistically compatible with those values, in contrast
to the K = 1 case discussed above. This result supports the hypothesis of
[12] that an RS ansatz might be sufficient for CM, and suggests that the
hypothesis of an identical $\alpha_c$ for storing binary and Gaussian
patterns might hold at least for CM.

In closing, we would like to draw attention again to the fact that the
values of f(0) differ strongly between various perceptrons. In particular,
with the single exception of the CM with K = 3, they differ considerably
from 1/2. On the other hand, the relation $P(\alpha_{0.5}(N), N) = 0.5$ has
often been the basis of an extrapolation to the infinite system critical
parameter from simulations of finite systems [?]. If we define $y_{0.5}$ by
$f(y_{0.5}) = 0.5$, then

$$\alpha_{0.5}(N) = \alpha_c + y_{0.5}\, N^{-1/\nu}. \tag{7}$$

Together with the fact that the FSS exponent $\nu$ deviates from one,
particularly for K = 1 and for CM, this feature emphasises the need for an
extrapolation that is nonlinear, instead of linear, in 1/N to correctly
obtain the thermodynamic limit value of $\alpha_{0.5}(N)$ [28], and it may
be the source of some problems encountered in earlier simulation studies
[?].
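The effect of the extrapolation variable can be illustrated with a small fit. The Python sketch below uses synthetic values of $\alpha_{0.5}(N)$ generated from Eq. (7) with made-up parameters (they are not simulation results) and compares a straight-line fit in $N^{-1/\nu}$ with one in 1/N. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate_alpha_c(sizes, alpha_half, nu):
    """Estimate alpha_c as the intercept of alpha_0.5(N) plotted against N^(-1/nu)."""
    xs = [n ** (-1.0 / nu) for n in sizes]
    return fit_line(xs, alpha_half)[0]


# Synthetic illustration only: these parameters are invented for the example.
SIZES = [10, 14, 18, 22, 26, 30]
TRUE_ALPHA_C, TRUE_Y_HALF, TRUE_NU = 0.8, 0.5, 1.7
ALPHA_HALF = [TRUE_ALPHA_C + TRUE_Y_HALF * n ** (-1.0 / TRUE_NU) for n in SIZES]

if __name__ == "__main__":
    print("fit in N^(-1/nu):", extrapolate_alpha_c(SIZES, ALPHA_HALF, TRUE_NU))
    print("fit in 1/N:      ", extrapolate_alpha_c(SIZES, ALPHA_HALF, 1.0))
```

With these invented numbers the fit in $N^{-1/\nu}$ recovers the input value of $\alpha_c$, while the fit that is linear in 1/N overestimates it, because $N^{-1/\nu}$ with $\nu > 1$ is a concave function of 1/N.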

The above results demonstrate that the FSS ansatz not only offers a new and
powerful computational approach for evaluating the critical storage
capacities of binary perceptrons, but also allows a detailed view of the
storage properties in the critical region. We believe that it will prove
valuable in analyzing the properties of a wide variety of binary perceptron
topologies.